dailybuzz.cc - data-science - Probability and Statistics

Introduction to R and Basic Statistics

Rating: 4.5 out of 5

Statistical data processing and visualization of analysis results is an unavoidable stage of working with data obtained in various fields of the natural sciences, in sociology, psychology, or economics. In this course we will cover the fundamentals of statistics in detail and get acquainted with the basics of the statistical programming language R. We will teach you to use visualization tools (charts, plots, etc.) flexibly to make the results of your analysis as accessible and clear as possible. You will learn to calculate the main descriptive statistics: the median and quantiles, the mean and standard deviation. You will become familiar with the principles of using theoretical distributions of statistics to construct confidence intervals and test hypotheses (using the t-test as an example). Finally, we will discuss the difficulties that arise in multiple hypothesis testing and teach you how to overcome them. This course is for people who are just starting out with statistics, as well as for those who want not only to master the basic capabilities of R but also to learn how to build complex plots.

Practical Time Series Analysis

Rating: 4.5 out of 5

Welcome to Practical Time Series Analysis! Many of us are "accidental" data analysts. We trained in the sciences, business, or engineering and then found ourselves confronted with data for which we have no formal analytic training. This course is designed for people with some technical competencies who would like more than a "cookbook" approach, but who still need to concentrate on the routine sorts of presentation and analysis that deepen the understanding of our professional topics. In Practical Time Series Analysis we look at data sets that represent sequential information, such as stock prices, annual rainfall, sunspot activity, the price of agricultural products, and more. We look at several mathematical models that might be used to describe the processes which generate these types of data. We also look at graphical representations that provide insights into our data. Finally, we learn how to make forecasts that say intelligent things about what we might expect in the future. Please take a few minutes to explore the course site. You will find video lectures with supporting written materials as well as quizzes to help emphasize important points. The language for the course is R, a free implementation of the S language. It is a professional environment and fairly easy to learn. You can discuss material from the course with your fellow learners. Please take a moment to introduce yourself! Time Series Analysis can take effort to learn; we have tried to present those ideas that are "mission critical" in a way that lets you understand enough of the math to feel satisfied while also being immediately productive. We hope you enjoy the class!

Linear Regression

Rating: 0 out of 5

In this course we will cover the main methods for describing relationships between quantitative variables. While correlation analysis makes it possible to quantify the strength and direction of the relationship between two variables, building regression models offers broader possibilities. With regression analysis you can quantitatively describe the behavior of the variables under study as a function of predictor variables and obtain predictions on new data. You will learn how to build simple and multiple linear models using the R language. Every method has its limitations, so we will help you understand in which situations linear regression can and cannot be applied, and teach you methods for diagnosing fitted models. A special place in the course is given to the in-depth anatomy of regression analysis: you will master the matrix operations that underlie linear regression so that you can make sense of more complex kinds of linear models. If you need to find and describe relationships between phenomena that can be measured quantitatively, this course is a good opportunity to understand how simple and multiple linear regression work and to learn about the capabilities and limitations of these methods. The course is intended for those who are already familiar with basic data analysis techniques in R and with creating simple .html documents using rmarkdown and knitr.

Causal Inference 2

Rating: 0 out of 5

This course offers a rigorous mathematical survey of advanced topics in causal inference at the Master’s level. Inferences about causation are of great importance in science, medicine, policy, and business. This course provides an introduction to the statistical literature on causal inference that has emerged in the last 35-40 years and that has revolutionized the way in which statisticians and applied researchers in many disciplines use data to make inferences about causal relationships. We will study advanced topics in causal inference, including mediation, principal stratification, longitudinal causal inference, regression discontinuity, interference, and fixed effects models.

Improving your statistical inferences

Rating: 4.5 out of 5

This course aims to help you to draw better statistical inferences from empirical research. First, we will discuss how to correctly interpret p-values, effect sizes, confidence intervals, Bayes Factors, and likelihood ratios, and how these statistics answer different questions you might be interested in. Then, you will learn how to design experiments where the false positive rate is controlled, and how to decide upon the sample size for your study, for example in order to achieve high statistical power. Subsequently, you will learn how to interpret evidence in the scientific literature given widespread publication bias, for example by learning about p-curve analysis. Finally, we will talk about how to do philosophy of science, theory construction, and cumulative science, including how to perform replication studies, why and how to pre-register your experiment, and how to share your results following Open Science principles. In practical, hands-on assignments, you will learn how to simulate t-tests to learn which p-values you can expect, calculate likelihood ratios and get an introduction to binomial Bayesian statistics, and learn about the positive predictive value, which expresses the probability that published research findings are true. We will experience the problems with optional stopping and learn how to prevent these problems by using sequential analyses. You will calculate effect sizes, see how confidence intervals work through simulations, and practice doing a-priori power analyses. Finally, you will learn how to examine whether the null hypothesis is true using equivalence testing and Bayesian statistics, and how to pre-register a study and share your data on the Open Science Framework. All videos now have Chinese subtitles. More than 30,000 learners have enrolled so far! If you enjoyed this course, I can recommend following it up with my new course "Improving Your Statistical Questions".
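The positive predictive value mentioned above can be illustrated with a small calculation. The sketch below is not taken from the course; it assumes the simplest textbook form, in which the only inputs are the prior probability that a tested hypothesis is true, the statistical power, and the significance level, and it ignores bias and selective reporting. The function name `positive_predictive_value` and the example numbers are hypothetical.

```python
def positive_predictive_value(prior: float, power: float, alpha: float) -> float:
    """Share of significant results that reflect a true effect.

    Assumes (for illustration only) no bias or selective reporting:
    prior  - probability that a tested hypothesis is true
    power  - probability of a significant result when the effect is real
    alpha  - probability of a significant result when there is no effect
    """
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical example: half of the tested hypotheses are true,
# studies have 80% power, and the significance level is 5%.
ppv = positive_predictive_value(prior=0.5, power=0.8, alpha=0.05)
print(round(ppv, 3))  # 0.941
```

Lowering the prior or the power in this sketch lowers the result, which is one way to see why the quantity depends on more than the significance level alone.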

Advanced Linear Models for Data Science 1: Least Squares

Rating: 4 out of 5

Welcome to the Advanced Linear Models for Data Science Class 1: Least Squares. This class is an introduction to least squares from a linear algebraic and mathematical perspective. Before beginning the class make sure that you have the following:
- A basic understanding of linear algebra and multivariate calculus.
- A basic understanding of statistics and regression models.
- At least a little familiarity with proof based mathematics.
- Basic knowledge of the R programming language.
After taking this course, students will have a firm foundation in a linear algebraic treatment of regression modeling. This will greatly augment applied data scientists' general understanding of regression models.
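As a rough illustration of what a linear algebraic view of least squares means, the sketch below fits a straight line by solving the 2x2 normal equations (XᵀX)β = Xᵀy by hand. It is not course material: the function name `fit_line` and the data are hypothetical, and Python is used here only for illustration, whereas the course itself uses R.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope via the normal equations.

    The design matrix X has a column of ones and a column of x values, so
    X^T X = [[n, sum(x)], [sum(x), sum(x^2)]] and X^T y = [sum(y), sum(x*y)].
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))

    det = n * sxx - sx * sx  # determinant of X^T X
    if det == 0:
        raise ValueError("x values must not all be identical")

    # Multiply the explicit inverse of the 2x2 matrix X^T X by X^T y.
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


# Hypothetical data lying exactly on y = 1 + 2x.
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)  # 1.0 2.0
```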

Power and Sample Size for Multilevel and Longitudinal Study Designs

Rating: 4 out of 5

Power and Sample Size for Longitudinal and Multilevel Study Designs, a five-week, fully online course, covers innovative, research-based power and sample size methods and software for multilevel and longitudinal studies. The power and sample size methods and software taught in this course can be used for any health-related or, more generally, social science-related (e.g., educational research) application. All examples in the course videos are from real-world studies in behavioral and social science employing multilevel and longitudinal designs. The course philosophy is to focus on the conceptual knowledge needed to apply power and sample size methods. The goal of the course is to teach and disseminate methods for accurate sample size choice, and ultimately, the creation of a power/sample size analysis for a relevant research study in your professional context. Power and sample size selection is one of the most important ethical questions researchers face. Interventional studies that are too large expose human volunteer research participants to possible, and needless, harm from research. Interventional studies that are too small will fail to reach their scientific objective, again bringing possible harm to research participants, without the possibility of concomitant gain from the increase in knowledge. For studies in which there are no possible harms to the participants, such as observational studies, proper power ensures good stewardship of both time and money. Most National Institutes of Health (NIH) study sections will only fund a grant if the grantee has written a compelling and accurate power and sample size analysis. The Institute of Education Sciences (IES), the statistics, research, and evaluation arm of the U.S. Department of Education, also offers competitive grants requiring a compelling and accurate power and sample size analysis (Goal 3: Efficacy and Replication and Goal 4: Effectiveness/Scale-Up).
At the end of the online course, learners will be able to:
• Use a framework and strategy for study planning
• Write study aims as testable hypotheses
• Describe a longitudinal and multilevel study design
• Write a statistical analysis plan
• Plan a sampling design for subgroups, e.g. racial and ethnic
• Demonstrate the feasibility of recruitment
• Describe expected missing data and dropout
• Write a power and sample size analysis that is aligned with the planned statistical analysis

This is a five-week intensive and interactive online course. We will use a mix of instructional videos, software demonstration videos, online discussion forums, online readings, quizzes, exercise assignments, and peer-review assignments. The final course project is a peer-reviewed research study you design for future power or sample size analysis.

Statistics with R Capstone

Rating: 4.5 out of 5

The capstone project will be an analysis using R that answers a specific scientific/business question provided by the course team. A large and complex dataset will be provided to learners and the analysis will require the application of a variety of methods and techniques introduced in the previous courses, including exploratory data analysis through data visualization and numerical summaries, statistical inference, and modeling as well as interpretations of these results in the context of the data and the research question. The analysis will implement both frequentist and Bayesian techniques and discuss in context of the data how these two approaches are similar and different, and what these differences mean for conclusions that can be drawn from the data. A sampling of the final projects will be featured on the Duke Statistical Science department website. Note: Only learners who have passed the four previous courses in the specialization are eligible to take the Capstone.

Basic Statistics

Rating: 4.5 out of 5

Understanding statistics is essential to understand research in the social and behavioral sciences. In this course you will learn the basics of statistics; not just how to calculate them, but also how to evaluate them. This course will also prepare you for the next course in the specialization - the course Inferential Statistics. In the first part of the course we will discuss methods of descriptive statistics. You will learn what cases and variables are and how you can compute measures of central tendency (mean, median and mode) and dispersion (standard deviation and variance). Next, we discuss how to assess relationships between variables, and we introduce the concepts correlation and regression. The second part of the course is concerned with the basics of probability: calculating probabilities, probability distributions and sampling distributions. You need to know about these things in order to understand how inferential statistics work. The third part of the course consists of an introduction to methods of inferential statistics - methods that help us decide whether the patterns we see in our data are strong enough to draw conclusions about the underlying population we are interested in. We will discuss confidence intervals and significance tests. You will not only learn about all these statistical concepts, you will also be trained to calculate and generate these statistics yourself using freely available statistical software.
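The measures of central tendency and dispersion named above can be computed in a few lines. The sketch below is only an illustration, not part of the course: the data are made up, and it uses Python's standard `statistics` module rather than whichever free statistical software the course relies on. It reports the population variance and standard deviation; the sample versions (`statistics.variance`, `statistics.stdev`) divide by n - 1 instead of n.

```python
import statistics

# Hypothetical scores for eight cases on a single variable.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency
mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

# Dispersion (population versions: divide by n)
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)

print(mean, median, mode, variance, std_dev)
```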

Inferential Statistics

Rating: 4 out of 5

Inferential statistics are concerned with making inferences based on relations found in the sample, to relations in the population. Inferential statistics help us decide, for example, whether the differences between groups that we see in our data are strong enough to provide support for our hypothesis that group differences exist in general, in the entire population. We will start by considering the basic principles of significance testing: the sampling and test statistic distribution, p-value, significance level, power and type I and type II errors. Then we will consider a large number of statistical tests and techniques that help us make inferences for different types of data and different types of research designs. For each individual statistical test we will consider how it works, for what data and design it is appropriate and how results should be interpreted. You will also learn how to perform these tests using freely available software. For those who are already familiar with statistical testing: We will look at z-tests for 1 and 2 proportions, McNemar's test for dependent proportions, t-tests for 1 mean (paired differences) and 2 means, the Chi-square test for independence, Fisher’s exact test, simple regression (linear and exponential) and multiple regression (linear and logistic), one way and factorial analysis of variance, and non-parametric tests (Wilcoxon, Kruskal-Wallis, sign test, signed-rank test, runs test).
